Abstract. In distributed networks, it is often useful for the nodes to be
aware of dense subgraphs; e.g., such a dense subgraph could reveal dense
substructures in otherwise sparse graphs (e.g., the World Wide Web or
social networks); these might reveal community clusters or dense regions for
possibly maintaining good communication infrastructure. In this work,
we address the problem of self-awareness of nodes in a dynamic network
with regard to graph density, i.e., we give distributed algorithms for
maintaining dense subgraphs that the member nodes are aware of. The
only knowledge that the nodes need is that of the dynamic diameter D,
i.e., the maximum number of rounds it takes for a message to traverse the
dynamic network. For our work, we consider a model where the number
of nodes is fixed, but a powerful adversary can add or remove a limited
number of edges from the network at each time step. The communication
is by broadcast only and follows the CONGEST model. Our algorithms
are continuously executed on the network, and at any time (after some
initialization) each node will be aware of whether or not it is part of a
particular dense subgraph. We give algorithms that $(2+\epsilon)$-approximate the densest
subgraph and $(3+\epsilon)$-approximate the at-least-k-densest subgraph (for a
given parameter k). Our algorithms work for a wide range of parameter
values and run in $O(D \log_{1+\epsilon} n)$ time. Further, a special case of our
results also gives the first fully decentralized approximation algorithms for
the densest and at-least-k-densest subgraph problems for static distributed
graphs.



1 Introduction 

Density is a very well studied graph property with a wide range of applications 
stemming from the fact that it is an excellent measure of the strength of inter- 
connectivity between nodes. While several variants of graph density problems 
and algorithms have been explored in the classical setting, there is surprisingly 
little work that addresses this question in the distributed computing framework. 
This paper focuses on decentralized algorithms for identifying dense subgraphs
in dynamic networks. 

Finding dense subgraphs has received a great deal of attention in graph 
algorithms literature because of the robustness of the property. The density 
of a subgraph only gradually changes when edges come and go in a network, 
unlike other graph properties such as connectivity that are far more sensitive 
to perturbation. Density measures the strength of a set of nodes by the graph 
induced on them from the overall structure. The power of density lies in locally 
observing the strength of any set of nodes, large or small, independent of the 
entire network. 

Dense subgraphs often give key information about the network structure, its
evolution and dynamics. To quote [22]: "Dense subgraph extraction is therefore
a key primitive for any in-depth study of the nature of a large graph". Often,
dense subgraphs may reveal information about community structure in other- 
wise sparse graphs e.g. the World Wide Web or social networks. They are good 
structures for studying the dynamics of a network and have been used, for ex- 
ample, to study link spams [22]. It is also possible to imagine a scenario where 
a dynamically evolving peer-to-peer network may want to route traffic through 
the densest parts of its network to ease congestion; thus, these subgraphs could 
form the basis of an efficient communication backbone (in combination with 
other subgraphs selected using appropriate centrality measures). 

In this paper, we expand the static CONGEST model [41] and consider a 
dynamic setting where the graph edges may change continually. We present al- 
gorithms for approximating the (at least size k) densest subgraph in a dynamic 
graph model to within constant factors. Our algorithms are not only designed 
to compute size-constrained dense subgraphs, but also track or maintain them 
through time, thereby allowing the network to be aware of dense subgraphs 
even as the network changes. They are fully decentralized and adapt well to 
rapid network failures or modifications. This gives the densest subgraph problem
a special status among global graph problems: while most graph problems
are hard to approximate in $o(\sqrt{n})$ time even on static distributed networks of
small diameter [13, 38, 20], the densest subgraph problem can be approximated
in polylogarithmic time (in terms of n) for small D, even in dynamic networks.

We now explain our model for dynamic networks, define the density objectives
considered in this paper, and state our results.

Distributed Computing Model. Consider an undirected, unweighted, con- 
nected n-node graph G = (V,E). Suppose that every node (vertex) hosts a pro- 
cessor with unbounded computational power (though our algorithms only use 
time and space polynomial in n at each vertex), but with only local knowledge 
initially. We assume that nodes have unique identifiers. The nodes may accept 
some additional inputs as specified by the problem at hand. The communication 
is synchronous, and occurs in discrete pulses, called rounds. Further, nodes can 
send messages to each of their neighbors in every round. In our model, all the 
nodes wake up simultaneously at the beginning of round 1. In each round each
node v is allowed to send an arbitrary message, subject to the bandwidth
constraint of $O(\log n)$ bits, through any edge e = (v, u) that is incident to v, and
these messages will arrive at each corresponding neighbor at the end of the
current round. Our model is akin to the standard model of distributed computation
known as the CONGEST model [41]. The message size constraint of CONGEST
is very important for large-scale resource-constrained dynamic networks where
running time is crucial.

Edge-Dynamic Network Model. We use the edge deletion/addition model;
i.e., we consider a sequence of (undirected) graphs $G_0, G_1, \ldots$ on n nodes, where,
for any t, $G_t$ denotes the state of the dynamic network G(V, E) at time t. The
adversary deletes and/or inserts up to r edges at each step, i.e.,
$E(G_{t+1}) = (E(G_t) \setminus E_U) \cup E_V$, where $E_U \subseteq E(G_t)$, $E_V \subseteq E(\bar{G}_t)$, and $|E_U| + |E_V| \le r$
(where $\bar{G}_t$ is the complement graph of $G_t$). The edge change rate is denoted by
the parameter r.
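To make the update rule concrete, the following is a minimal sketch of one adversarial step. It assumes a random adversary rather than a worst-case one, and the function name `adversary_step` and the edge representation (tuples `(u, v)` with `u < v`) are choices made here for illustration only.

```python
import itertools
import random


def adversary_step(nodes, edges, r, rng):
    """Return E(G_{t+1}) after a (random, not worst-case) adversary
    deletes and/or inserts at most r edges in total."""
    all_pairs = set(itertools.combinations(sorted(nodes), 2))
    absent = sorted(all_pairs - edges)          # edges of the complement graph
    budget = rng.randint(0, r)                  # |E_U| + |E_V| <= r
    n_del = rng.randint(0, min(budget, len(edges)))
    n_ins = min(budget - n_del, len(absent))
    deleted = set(rng.sample(sorted(edges), n_del))   # E_U, subset of E(G_t)
    inserted = set(rng.sample(absent, n_ins))         # E_V, subset of complement
    return (edges - deleted) | inserted
```

Because deleted edges come from the current graph and inserted edges from its complement, the symmetric difference between consecutive edge sets has size at most r.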

Following the notion in [33], we define the dynamic diameter of the dynamic
network G(V, E), denoted by D, to be the maximum time a message needs to
traverse the network at any time. More formally, dynamic diameter is defined as
follows:

Definition 1 (Dynamic Diameter (Adapted from [33], Definition 3)).

We say that the dynamic network G = (V, E) has a dynamic diameter of D up to
time t if D is the smallest positive integer such that, for all $t' \le t$ and $u, v \in V$,
we have $(u, \max\{0, t' - D\}) \leadsto (v, t')$, where, for each pair of vertices x, y and
times $t_1 \le t_2$, $(x, t_1) \leadsto (y, t_2)$ means that at time $t_2$ node y can receive direct
information, through a chain of messages, originating from node x at time $t_1$.

Note that the nodes do not need to know the exact dynamic diameter D but 
only a (loose) approximation to it. For simplicity, we assume henceforth that the 
nodes know the exact value of D. 
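As an illustration of the flooding notion behind Definition 1, the sketch below measures how many rounds a message started at one node needs to reach every node of a changing graph. The representation (a list of adjacency dictionaries, one per time step) and the name `flooding_time` are assumptions made for this example; the dynamic diameter would be the maximum of this quantity over all sources and start times.

```python
def flooding_time(graphs, source, start):
    """Rounds until a message from `source`, released at time `start`,
    has reached every node; None if the graph sequence ends first.

    graphs: list of dicts mapping each node to the set of its neighbors
            at that time step (hypothetical representation).
    """
    nodes = set(graphs[start])
    informed = {source}
    rounds = 0
    for adj in graphs[start:]:
        if informed == nodes:
            return rounds
        # Every informed node forwards the message to its current neighbors.
        informed = informed | {w for u in informed for w in adj[u]}
        rounds += 1
    return rounds if informed == nodes else None
```

On a static path 0-1-2, flooding from node 0 takes two rounds; if the only edge present at every step is (0, 1), node 2 is never reached and the function returns None.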

There are several measures of efficiency of distributed algorithms, but we will 
concentrate on one of them, specifically, the running time, that is, the number 
of rounds of distributed communication. (Note that the computation that is 
performed by the nodes locally is "free", i.e., it does not affect the number of 
rounds.) 

We are interested in algorithms that can compute and maintain an approximate
(at-least-k) densest subgraph of the network at all times, after a short
initialization time. We say that an algorithm can compute and maintain a solution
P in time T if it can compute the solution in T rounds and can maintain a
solution at all times after time T, even as the network changes dynamically.

1.1 Problem definition 

Let G = (V, E) be an undirected graph and $S \subseteq V$ be a set of nodes. Let us
define the following:

Graph Density. The density of a graph G(V, E) is defined as $|E|/|V|$.

Subgraph Density. The density of a subgraph defined by a subset of nodes S
of V(G) is defined as the density of the induced subgraph. We will use $\rho(S)$ to


Fig. 1. The distributed Edge Insert and Delete Model. 



Each node of Go is a processor. 

Each processor starts with a list of its neighbors in Go . 

Pre-processing: Processors may exchange messages with their neighbors. 

for t := 1 to T do

    Adversary deletes and/or inserts up to r edges at each step, i.e.,
    $E(G_{t+1}) = (E(G_t) \setminus E_U) \cup E_V$, where $E_U \subseteq E(G_t)$ and
    $E_V \subseteq E(\bar{G}_t)$ (where $\bar{G}_t$ is the complement graph of $G_t$).

    if edge (u, v) is inserted or edge (u, v) is deleted then

        Nodes u and v may update their information and exchange messages
        with their neighbors.
        Computation phase:

        Nodes may communicate (synchronously, in parallel) with their immediate
        neighbors. These messages are never lost or corrupted, may contain
        the names of other vertices, and are received by the end of this phase.
    end if

    At the end of this phase, we call the graph $G_t$.
end for

Success metrics:

1. Approximate Dense Subgraphs: Graph $S'_T$: The induced graph of a
set $S'_T \subseteq V_T$, s.t., $\rho(S'_T) \ge \rho(S^*_T)/\alpha$, where $S^*_T \subseteq V$, s.t., $\rho(S^*_T) = \max \rho(S_T)$
over all $S_T \subseteq V_T$.

2. Approximate at-least-k-Dense Subgraphs: Graph $S^k_T$: The induced
graph of a set $S^k \subseteq V$, $|S^k| \ge k$, s.t., $\rho(S^k) \ge \rho(S^{k*})/\alpha$, where $S^{k*} \subseteq V$,
$|S^{k*}| \ge k$, s.t., $\rho(S^{k*}) = \max \rho(S)$ over all $S \subseteq V$, $|S| \ge k$.

3. Communication per edge. The maximum number of bits sent across
a single edge in a single recovery round. $O(\log n)$ in the CONGEST model.

4. Computation time. The maximum total time (rounds) for all nodes 
to compute their density estimations starting from scratch assuming it 
takes a message no more than 1 time unit to traverse any edge and we 
have unlimited local computational power at each node. 



denote the density of the subgraph induced by S. Therefore, $\rho(S) = \frac{|E(S)|}{|S|}$. Here
E(S) is the subset of edges (u, v) of E where $u \in S$ and $v \in S$. In particular,
when talking about the density of a subgraph defined by a set of vertices S
induced on G, we use the notation $\rho_G(S)$. We also use $\rho_t(S)$ to denote $\rho_{G_t}(S)$.
When clear from context, we omit the subscript G.
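The definition can be written down directly. The sketch below is only an illustration; the edge representation (pairs of node identifiers) and the convention that the empty set has density 0 are assumptions made here, not part of the definition above.

```python
def induced_edges(edges, S):
    """E(S): the edges (u, v) of E with both endpoints in S."""
    S = set(S)
    return {(u, v) for (u, v) in edges if u in S and v in S}


def density(edges, S):
    """Density |E(S)| / |S| of the subgraph induced by S."""
    S = set(S)
    if not S:
        return 0.0  # convention assumed here for the empty set
    return len(induced_edges(edges, S)) / len(S)
```

For example, a clique on four nodes has six edges and therefore density 6/4 = 1.5, while a triangle has density 1.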

The problem we address in this paper is to construct distributed algorithms 
to discover the following: 

- (Approximate) Densest subgraphs: The densest subgraph problem is to
find a set $S^* \subseteq V$, s.t. $\rho(S^*) = \max \rho(S)$ over all $S \subseteq V$. An $\alpha$-approximate
solution S' will be a set $S' \subseteq V$, s.t. $\rho(S') \ge \rho(S^*)/\alpha$.

- (Approximate) At-least-k-densest subgraphs: The densest at-least-k-subgraph
problem is the previous problem restricted to sets of size at least
k, i.e., to find a set $S^{k*} \subseteq V$, $|S^{k*}| \ge k$, s.t. $\rho(S^{k*}) = \max \rho(S)$ over all
$S \subseteq V$, $|S| \ge k$. An $\alpha$-approximate solution $S^k$ will be a set $S^k \subseteq V$, $|S^k| \ge k$,
s.t. $\rho(S^k) \ge \rho(S^{k*})/\alpha$.
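For very small graphs, both objectives can be checked by exhaustive search, which is handy for validating an approximate solution against the definition of an approximation factor. The sketch below is exponential-time and purely illustrative; `densest_at_least_k` and `is_alpha_approx` are names introduced here.

```python
from itertools import combinations


def density(edges, S):
    """|E(S)| / |S| for the subgraph induced by the node set S."""
    S = set(S)
    return sum(1 for (u, v) in edges if u in S and v in S) / len(S)


def densest_at_least_k(nodes, edges, k=1):
    """Brute-force optimum over all node sets of size at least k."""
    nodes = sorted(nodes)
    best, best_rho = None, -1.0
    for size in range(max(k, 1), len(nodes) + 1):
        for S in combinations(nodes, size):
            rho = density(edges, S)
            if rho > best_rho:
                best, best_rho = set(S), rho
    return best, best_rho


def is_alpha_approx(edges, S, opt_rho, alpha):
    """True if density(S) >= opt_rho / alpha."""
    return density(edges, S) >= opt_rho / alpha
```

On a 4-clique with a pendant path attached, the unconstrained optimum is the clique (density 1.5), while requiring at least five nodes forces one path node into the solution (density 7/5).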

In the distributed setting, we require that every node knows whether it is in
the solution S' or $S^k$ or not. We note that the latter problem is NP-hard,
and thus it is crucial to consider approximation algorithms. The former problem
can be solved exactly in polynomial time in the centralized setting, and it is an
interesting open problem whether there is an exact distributed algorithm that
runs in $O(D\,\mathrm{polylog}\, n)$ time, even in static networks.

1.2 Our Results 

We give approximation algorithms for the densest and at-least-k-densest subgraph
problems which are efficient even on dynamic distributed networks. In particular,
we develop an algorithm that, for a fixed constant c and any $\epsilon > 0$,
$(2+\epsilon)$-approximates the densest subgraph in $O(D \log_{1+\epsilon} n)$ time provided that the
densest subgraph has high density, i.e., it has a density of at least $(cDr \log n)/\epsilon$
(recall that r and D are the change rate and dynamic diameter of the dynamic network,
respectively). We also develop a $(3+\epsilon)$-approximation algorithm for the
at-least-k-densest subgraph problem with the same running time, provided that the value
of the density of the at-least-k-densest subgraph is at least $(cDr \log n)/(k\epsilon)$. We
state these theorems in a simplified form, along with some corollaries, below. There, $\epsilon$
can be set to any arbitrarily small constant. We note again that at the end of
our algorithms, every node knows whether it is in the returned subgraph or
not.

Theorem 2. There exists a distributed algorithm that for any dynamic graph 
with dynamic diameter D and parameter r returns a subgraph at time t such 
that, w.h.p., the density of the returned subgraph is a $(2+\epsilon)$-approximation to
the density of the densest subgraph at time t if the densest subgraph has density
at least $\Omega(Dr \log n)$.

Theorem 3. There exists a distributed algorithm that for any dynamic graph 
with dynamic diameter D and parameter r returns a subgraph of size at least 
k at time t such that, w.h.p., the density of the returned subgraph is a
$(3+\epsilon)$-approximation to the density of the densest at-least-k subgraph at time t if the
densest at-least-k subgraph has density at least $\Omega(Dr \log n / k)$.

We mention two special cases of these theorems informally below. We prove
the most general theorem statements, depending on the parameters r and D, in
Section 3.

Corollary 4. Given a dynamic graph with dynamic diameter $O(\log n)$ and a
rate of change $r = O(\log^{\alpha} n)$ for some constant $\alpha$ (i.e., r is poly-logarithmic
in n), there is a distributed algorithm that at any time t can return, w.h.p., a
$(2+\epsilon)$-approximation of the densest subgraph at time t if the densest subgraph has
density at time t at least $\Omega(\log^{\alpha+2} n)$.



Corollary 5. Given a dynamic graph with dynamic diameter $O(\log n)$ and a
rate of change $r = O(\log^{\alpha} n)$ for some constant $\alpha$ (i.e., r is poly-logarithmic
in n), there is a distributed algorithm that at any time t can return, w.h.p., a
$(3+\epsilon)$-approximation of the k-densest subgraph at time t if the k-densest subgraph
has density at time t at least $\Omega(\log^{\alpha+2} n/k)$.
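As a quick check of where the exponent in the two corollaries comes from, the thresholds of Theorems 2 and 3 can be evaluated under the stated bounds on D and r. The constants $c_1$ and $c_2$ below are names introduced here for illustration.

```latex
\begin{align*}
D \le c_1 \log n, \quad r \le c_2 \log^{\alpha} n
  &\;\Longrightarrow\;
  D\, r \log n \;\le\; c_1 c_2 \log^{\alpha+2} n, \\
  &\;\Longrightarrow\;
  \frac{D\, r \log n}{k} \;\le\; \frac{c_1 c_2 \log^{\alpha+2} n}{k}.
\end{align*}
```

Hence a density of $\Omega(\log^{\alpha+2} n)$ (respectively $\Omega(\log^{\alpha+2} n/k)$), with a sufficiently large hidden constant, meets the requirement $\Omega(Dr \log n)$ of Theorem 2 (respectively $\Omega(Dr \log n/k)$ of Theorem 3).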

Our algorithms follow the main ideas of centralized approximation algorithms
[29, 4, 11]. These centralized algorithms cannot be efficiently implemented even
on static distributed networks. We show how some ideas of these algorithms 
can be turned into time-efficient distributed algorithms with a small increase in 
the approximation guarantees. Similar ideas have been independently discovered 
and used to obtain efficient streaming and MapReduce algorithms by Bahmani 
et al. [8]. 

Notice that this is already a wide range of parameter values for which our
results are interesting, since the density of densest subgraphs can be as large as
$\Omega(n)$ while the diameter in peer-to-peer networks is typically $O(\log n)$, and the
parameter r depends on the stability of the network. A caveat, though, is that in
the theorems above, D refers to the flooding time of the dynamic network, and
not the diameter of any specific snapshot; understanding the relationship between
these quantities remains open.

Further, our general theorems also imply the following for static graphs (by
simply setting r = 0). No such results were known in the distributed setting even
for static graphs.

Corollary 6. In a static graph, there is a distributed algorithm that obtains,
w.h.p., a $(2+\epsilon)$-approximation to the densest subgraph problem in $O(D \log n)$
rounds of the CONGEST model.

Corollary 7. In a static graph, there is a distributed algorithm that obtains,
w.h.p., a $(3+\epsilon)$-approximation to the k-densest subgraph problem in $O(D \log n)$
rounds of the CONGEST model.

Notice that this is an unconditional guarantee for static graphs (i.e. does not 
require any bound on the density of the optimal) and is the first distributed 
algorithm for these problems in the CONGEST model. 

Back to dynamic graphs, in addition to computing the $(2+\epsilon)$-approximated
densest and $(3+\epsilon)$-approximated at-least-k-densest subgraphs, our algorithm
can also maintain them at all times with high probability. This means that, at
all times (except for a short initialization period), all nodes are aware of whether
they are part of the approximated at-least-k-densest subgraphs, for all k.

Even though we assume that all the nodes know the value D, all our algorithms
work if some upper bound D' of D is known instead; all the algorithms
and analysis work identically using D' rather than D.

Organization. Our algorithms are described in Section 2 and the approximation
guarantees are proved in Section 3. We mention related work at the end of the
paper in Section 4.



2 Algorithm 



2.1 Main Algorithm 

The nature of our algorithm is such that we continuously maintain an approx- 
imation to the densest subgraph in the dynamic network. At any time, after a 
short initialization period, any node knows whether it is a member of the out- 
put subgraph of our algorithm. In this section, we give the description of the 
algorithm and fully specify the behavior of each of the nodes in the network. 
The running time analysis and the approximation guarantees are deferred to the 
following sections. 

Our main protocol for maintaining a dense subgraph is given in Algorithm 1.
It maintains a family of $p = O(\log_{1+\epsilon} n)$ candidates for the densest subgraph,
$\mathcal{F} = \{V_0, V_1, \ldots, V_p\}$, where $V_0 = V(G)$ and $V_i \subseteq V_{i-1}$ for all i, along with an
approximation of the number of nodes and edges in each graph, $\mathcal{R} = \{(m_0, n_0), \ldots, (m_p, n_p)\}$,
where $m_i$ and $n_i$ are the approximate number of edges and nodes, respectively,
of the subgraph of $G_t$ (the current graph) induced by $V_i$. The algorithm
works in phases in which it estimates the size of the current subgraph $V_j$ and the
number of edges in it using the algorithms discussed in the following subsection.
At the end of the phase it computes the next subgraph $V_{j+1}$ using the criterion
in Line 9 of Algorithm 1 (explained further in Section 3). After p such phases,
the algorithm has all the information it needs to output an approximation to
the densest subgraph. This process is repeated continuously, and the solution is
computed from the last complete family of graphs (i.e., complete computation
of p subgraphs).
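The phase structure can be illustrated with a centralized simulation on a static graph. This is only a sketch under simplifying assumptions: exact counts replace the approximate counting subroutines, the graph does not change, and the number of phases is capped at roughly $\log_{1+\epsilon} n$ (the cap and the name `peel_family` are choices made here). The pruning rule follows Line 9 of Algorithm 1 as stated.

```python
import math


def peel_family(nodes, edges, eps, max_phases=None):
    """Centralized, static simulation of one pass of the phases above.

    Returns the nested family [V_0, V_1, ...] and the list of exact
    (m_j, n_j) pairs (edge count, node count) for each V_j.
    """
    delta = eps / 24
    current = set(nodes)
    if max_phases is None:
        # Assumed cap of about log_{1+eps} n phases.
        max_phases = math.ceil(math.log(max(len(current), 2), 1 + eps)) + 1
    family, sizes = [], []
    for _ in range(max_phases):
        if not current:
            break
        inside = [(u, v) for (u, v) in edges if u in current and v in current]
        n_j, m_j = len(current), len(inside)
        family.append(set(current))
        sizes.append((m_j, n_j))
        degree = {v: 0 for v in current}
        for u, v in inside:
            degree[u] += 1
            degree[v] += 1
        # Line 9: keep the nodes whose induced degree is at least (1+delta)*m_j/n_j.
        threshold = (1 + delta) * m_j / n_j
        current = {v for v in current if degree[v] >= threshold}
    return family, sizes
```

On a 4-clique with a pendant path, the two path nodes are pruned in the first two phases and the clique, with density 1.5, remains as the densest candidate in the family.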

At any time, the densest subgraph can be computed using the steps outlined
in Algorithm 2. This procedure works simply by picking the subgraph with the
highest density, even if the size of this subgraph is less than k. If the subgraph
turns out to have fewer than k nodes, we pad it by having the rest of the nodes run a
distributed procedure to elect appropriately many nodes to add to the subgraph
and get its size up to at least k.

Any time a densest subgraph query is initiated in the network, the nodes
simply run Algorithm 2 based on the subgraphs continuously being maintained
by Algorithm 1 and compute which of them are in the approximate solution.
At the end of this query, each node is aware of whether it is in the approximate 
densest subgraph or not. 
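The query step can be sketched centrally as follows. This is a simplified illustration, not the distributed procedure itself: counts are exact, $k \ge 1$ is assumed, and a coin-flip attempt is accepted as soon as the padded set has at least k nodes, which replaces the two-sided acceptance test of Algorithm 2. The name `select_and_pad` and the attempt cap are hypothetical.

```python
import random


def select_and_pad(family, sizes, all_nodes, k, eps, rng, max_attempts=10_000):
    """Pick the candidate maximizing m_i / max(k, n_i); pad it to size >= k.

    family: list of node sets V_0, V_1, ...; sizes: list of (m_i, n_i).
    """
    delta = eps / 24
    all_nodes = set(all_nodes)
    if k > len(all_nodes):
        raise ValueError("k exceeds the number of nodes")
    i = max(range(len(family)), key=lambda j: sizes[j][0] / max(k, sizes[j][1]))
    chosen = set(family[i])
    if len(chosen) >= k:
        return chosen
    outside = sorted(all_nodes - chosen)
    target = (1 + delta) * k - len(chosen)       # number of nodes still wanted
    prob = min(1.0, target / len(all_nodes))     # per-node coin bias, target / n_0
    for _ in range(max_attempts):
        heads = {v for v in outside if rng.random() < prob}
        if len(chosen) + len(heads) >= k:        # simplified acceptance test
            return chosen | heads
    raise RuntimeError("padding did not succeed within max_attempts")
```

With a 4-clique as the best candidate and k = 6, the call returns the clique plus at least two randomly elected outside nodes; with k = 3 it returns the clique unchanged.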

2.2 Approximating the number of nodes and edges 

Our algorithms make use of an operation in which the number of nodes and 
edges in a given subgraph need to be computed. We just mention the algorithm 
idea here and present the detailed algorithm in Appendix A.

Algorithm Approx-Size-Estimation. We achieve this in $O(D)$ rounds using
a modified version of an algorithm from [52]. Their algorithm allows for approximate
counting of the size of a dynamic network with high probability. We modify
it to work for any subgraph that we are interested in. We also show how it can 



Input: $1 \ge \epsilon > 0$

Output: The algorithm maintains a family of sets of nodes $\mathcal{F} = \{V_0, V_1, \ldots, V_p\}$ and
induced graph sizes $\mathcal{R} = \{(m_0, n_0), (m_1, n_1), \ldots, (m_p, n_p)\}$.
1: Let $\delta = \epsilon/24$.
2: Let j = 0. Let $V_0 = V$ (i.e., we mark every node as in $V_0$).
3: repeat
4:   Compute $n_j$, a $(1+\delta)$-approximation of $|V_j|$ (i.e., $(1+\delta)|V_j| \ge n_j \ge (1-\delta)|V_j|$).
     At the end of this step every node knows $n_j$. See the counting algorithms
     referenced in Section 2.2 for detailed implementation.
5:   if $n_j = 0$ then
6:     Let j = 0. (Note that we do not recompute $n_0$.)
7:   end if
8:   Let $G_t$ be the network at the beginning of this step. Let $H_t$ be the subgraph of
     $G_t$ induced by $V_j$. We compute $m_j$, a $(1+\delta)$-approximation of the number of
     edges in $H_t$ (i.e., $(1+\delta)|E(H_t)| \ge m_j \ge (1-\delta)|E(H_t)|$). At the end of this step
     every node knows $m_j$. See Section 2.2 for detailed implementation.
9:   Let $G_{t'}$ be the network at the beginning of this step. Let $H_{t'}$ be the subgraph
     of $G_{t'}$ induced by $V_j$. Let $V_{j+1}$ be the set of nodes in $V_j$ whose degree in $H_{t'}$ is
     at least $(1+\delta)m_j/n_j$. At the end of this step, every node knows whether it is in
     $V_{j+1}$ or not.
10:  Let j = j + 1.
11: until forever

Algorithm 1: Maintain($\epsilon$)

be used to approximate the number of edges in this subgraph at a given time.
In the interest of space, these results can be found in Appendix A, described
under algorithms RandomizedApproximateCounting, Count Nodes, and
Count Edges.
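The appendix algorithms are not reproduced here. As a purely illustrative stand-in, the sketch below shows one standard way to estimate a set's size from quantities that can be aggregated by flooding: every member draws several exponential random values, the network keeps the coordinate-wise minimum, and the size is estimated from the sum of the minima. This is an assumption chosen for illustration, not the paper's RandomizedApproximateCounting procedure, and `estimate_size` is a hypothetical name.

```python
import random


def estimate_size(members, k, rng):
    """Estimate len(members) from k coordinate-wise minima of Exp(1) draws.

    The minimum of n independent Exp(1) values is Exp(n), so the sum of the
    k minima is Gamma(k, n) and (k - 1) / sum is an unbiased estimate of n.
    In a network, the minima would be computed by flooding; here we loop.
    """
    minima = [float("inf")] * k
    for _ in members:
        for j in range(k):
            x = rng.expovariate(1.0)
            if x < minima[j]:
                minima[j] = x
    return (k - 1) / sum(minima)
```

Minima are convenient in this setting because taking a minimum is insensitive to duplicates and message ordering; larger k tightens the estimate at the cost of more values per message.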

3 Analysis 

We analyze the approximation ratios of the algorithm presented in Section 2, with the
guarantee depending on parameters of the algorithm. We divide the analysis
into two parts: the first part is for the densest subgraph problem and the second
for the at-least-k-densest subgraph problem. Although the second part subsumes
the first part (if we ignore the value of the constant approximation ratio), we present
the first part since it has a simpler idea and a better approximation ratio.

3.1 Analysis for the densest subgraph problem 

Theorem 8. Let t be the time Algorithm 2 finishes, $V_i$ be the output of the algorithm,
$H^*$ be the optimal solution and T be the time of one round of Algorithms 1
and 2 (i.e., $T = cD\log_{1+\epsilon} n$ for some constant c). If $\rho_t(H^*) \ge 24Tr/\epsilon$ then
Algorithm 2 gives, w.h.p., a $(2+\epsilon)$-approximation, i.e.,

$\rho_t(V_i) \ge \rho_t(H^*)/(2+\epsilon)$.



Input: k, the parameter for the densest at-least-k subgraph problem, the algorithm
Maintain($\epsilon$) (cf. Algorithm 1), and its parameter notations.

Output: The algorithm outputs a set of nodes $V_i \cup \hat{V}$ (every node knows whether it
is in the set or not) such that $|V_i \cup \hat{V}| \ge k$.

1: Let $i = \arg\max_i m_i / \max(k, n_i)$.
2: if $n_i < (1+\delta)k$ then
3:   Let $\Delta = (1+\delta)k - n_i$. (Every node can compute $\Delta$ locally.)
4:   repeat
5:     Every node not in $V_i$ locally flips a coin which is heads with probability $\Delta/n_0$.
6:     Let $\hat{V}$ be the set of nodes whose coins return heads.
7:     Approximately count the number of nodes in $\hat{V}$ using the algorithm
       Approx-Size-Estimation discussed in Section 2.2 with error parameter $\delta$ passed to
       Count Edges under it. Let $\Delta'$ be the result returned. (Note that
       $\Delta'/(1+\delta) \le |\hat{V}| \le (1+\delta)\Delta'$ w.h.p.)
8:   until $(1+\delta)\Delta \le \Delta' \le (1+2\delta)\Delta$
9: end if
10: return $V_i \cup \hat{V}$

Algorithm 2: Densest Subgraph(k)



The rest of this subsection is devoted to proving the above theorem. Let t,
$V_i$ and $H^*$ be as in the theorem statement (note that $\hat{V}$ in Algorithm 2 is empty
when k = 0). Let t' be the time that $V_i$ is last computed by Algorithm 1. Let
t'' be the time Algorithm 1 starts counting the number of edges in $V_i$. We prove
the theorem using the following lemmas. The main idea is to first lower bound
$\rho_{t''}(V_i)$ using $\rho_{t'}(H^*)$ and then use it to obtain a lower bound for $\rho_{t'}(V_i)$ in terms
of $\rho_t(H^*)$. Finally, the proof is completed by lower bounding $\rho_t(V_i)$ in terms of
$\rho_{t'}(V_i)$.

Lemma 9. $\rho_{t''}(V_i) \ge \frac{1-\delta}{2(1+\delta)^2}\,\rho_{t'}(H^*)$.

Proof. Let H' be the densest subgraph of $G_{t'}$. Note that

$\rho_{t'}(H^*) \le \rho_{t'}(H')$.   (1)

Let $i^*$ be the smallest index such that $V(H') \subseteq V_{i^*}$ and $V(H') \not\subseteq V_{i^*+1}$. Note
that $i^*$ exists since the algorithm repeats until we get $V_j = \emptyset$. Let v be any vertex
in $V(H') \setminus V_{i^*+1}$. Let $H_{t',i}$ be the subgraph of $G_{t'}$ induced by nodes in $V_i$. Note
that

$\rho_{t'}(H') \le 2\deg_{H'}(v) \le 2\deg_{H_{t',i^*}}(v)$.   (2)

The first inequality is because we can otherwise remove v from H' and get a
subgraph of $G_{t'}$ that has a higher density than H'. The second inequality is
because $H' \subseteq H_{t',i^*}$. Since v is removed from $V_{i^*}$,

$\deg_{H_{t',i^*}}(v) < (1+\delta)\frac{m_{i^*}}{n_{i^*}}$,   (3)

where $\delta = \epsilon/24$ as in Algorithm 1. By the definition of $V_i$,

$\frac{m_{i^*}}{n_{i^*}} \le \frac{m_i}{n_i}$.   (4)

Note that $t - t'' \le T$ by the definition of T. Note also that $n_i \ge (1-\delta)|V_i|$ and
$m_i \le (1+\delta)|E_{t''}(V_i)|$ with high probability. It follows that

$\frac{m_i}{n_i} \le \frac{1+\delta}{1-\delta}\,\rho_{t''}(V_i)$.   (5)

Combining Eq. (1)-(5), we get $\rho_{t'}(H^*) \le \frac{2(1+\delta)^2}{1-\delta}\,\rho_{t''}(V_i)$ and thus the lemma.

We now make the following observation: 
Observation 10. $\rho_{t'}(H^*) \ge (1-\delta)\rho_t(H^*)$.

Proof. Note that $t - t' \le T$ and thus $|E_t(H^*)| - |E_{t'}(H^*)| \le Tr$. Since $\rho_t(H^*) \ge Tr/\delta$,
$\rho_{t'}(H^*) \ge \frac{\rho_t(H^*)|V(H^*)| - Tr}{|V(H^*)|} \ge \rho_t(H^*) - Tr \ge (1-\delta)\rho_t(H^*)$.

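We now combine the above Lemma 9 and Observation 10 to obtain the
following lemma:

Lemma 11. $\rho_{t'}(V_i) \ge \left(\frac{(1-\delta)^2}{2(1+\delta)^2} - \delta\right)\rho_t(H^*)$.

Proof. By directly combining Lemma 9 and Observation 10 we get the following:

$\rho_{t''}(V_i) \ge \frac{(1-\delta)^2}{2(1+\delta)^2}\,\rho_t(H^*)$.

Moreover, observe that there are at most Tr edges removed from $V_i$ in total, i.e.,
$|E_{t''}(V_i)| - |E_{t'}(V_i)| \le Tr$. Thus, using $Tr \le \delta\rho_t(H^*)$,

$\rho_{t'}(V_i) \ge \frac{|E_{t''}(V_i)| - Tr}{|V_i|} \ge \rho_{t''}(V_i) - Tr \ge \left(1 - \frac{2(1+\delta)^2\delta}{(1-\delta)^2}\right)\rho_{t''}(V_i)$

$\ge \left(1 - \frac{2(1+\delta)^2\delta}{(1-\delta)^2}\right)\left(\frac{(1-\delta)^2}{2(1+\delta)^2}\right)\rho_t(H^*) = \left(\frac{(1-\delta)^2}{2(1+\delta)^2} - \delta\right)\rho_t(H^*)$.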

We are now ready to prove the theorem. 

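Proof (Proof of Theorem 8). Note that $t - t' \le T$ and thus $|E_{t'}(V_i)| - |E_t(V_i)| \le Tr$.
Note that $\rho_{t'}(V_i) \ge \beta\rho_t(H^*) \ge \beta Tr/\delta$, where $\beta = \frac{(1-\delta)^2}{2(1+\delta)^2} - \delta$. We have

$\rho_t(V_i) \ge \frac{|E_{t'}(V_i)| - Tr}{|V_i|} \ge \rho_{t'}(V_i) - Tr \ge \left(1 - \frac{\delta}{\beta}\right)\rho_{t'}(V_i)$.

Now using Lemma 11 and the value of $\beta$, we get the following:
$\rho_t(V_i) \ge \left(1 - \frac{\delta}{\beta}\right)\beta\,\rho_t(H^*) = (\beta - \delta)\rho_t(H^*) = \left(\frac{(1-\delta)^2}{2(1+\delta)^2} - 2\delta\right)\rho_t(H^*)$.

The theorem follows by observing that $\frac{(1-\delta)^2}{2(1+\delta)^2} - 2\delta \ge \frac{1}{2+\epsilon}$ for any $\epsilon \le 1$ and
$\delta = \epsilon/24$.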
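The final inequality of the proof is easy to check numerically. The sketch below evaluates both sides for $\delta = \epsilon/24$ on a grid of values of $\epsilon$ in (0, 1]; the function names are introduced here for illustration, and a grid check is of course not a proof.

```python
def guaranteed_fraction(eps):
    """(1 - d)^2 / (2 (1 + d)^2) - 2 d with d = eps / 24."""
    d = eps / 24
    return (1 - d) ** 2 / (2 * (1 + d) ** 2) - 2 * d


def inequality_holds(eps):
    """True if the guaranteed fraction is at least 1 / (2 + eps)."""
    return guaranteed_fraction(eps) >= 1 / (2 + eps)


# Check eps = 0.001, 0.002, ..., 1.000.
all_hold = all(inequality_holds(i / 1000) for i in range(1, 1001))
```

The restriction to $\epsilon \le 1$ matters: at $\epsilon = 1.2$, for instance, the left-hand side drops below $1/(2+\epsilon)$.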



3.2 Analysis for the at-least-fc densest subgraph problem 

Theorem 12. Let t be the time Algorithm 2 finishes, $V_i \cup \hat{V}$ be the output of
the algorithm, $H^*$ be the optimal solution and T be the time of one iteration
of Algorithm 1 and Algorithm 2 (so $T = O(D\log_{1+\epsilon} n)$). If $k\rho_t(H^*) \ge 24Tr/\epsilon$
then Algorithm 2 returns a set $V_i \cup \hat{V}$ of size at least k that is, w.h.p., a
$(3+\epsilon)$-approximated solution, i.e.,

$\rho_t(V_i \cup \hat{V}) \ge \rho_t(H^*)/(3+\epsilon)$.

The proof of this theorem is placed in Appendix B, and we just mention the
main idea here. The proof follows a similar framework as that of Theorem 8.

Let t, $V_i$ and $H^*$ be as in the theorem statement. Let t' be the time that $V_i$
is last computed by Algorithm 1. Let t'' be the time Algorithm 1 starts counting
the number of edges in $V_i$. The crucial difference here is to obtain a strong lower
bound for $\rho_{t''}(V_i \cup \hat{V})$ in terms of $\rho_{t'}(H^*)$ and $\rho_t(H^*)$. This is then translated to
a lower bound on $\rho_{t'}(V_i \cup \hat{V})$ and subsequently $\rho_t(V_i \cup \hat{V})$ to complete the proof.
The crucial lemma and its proof turn out to be more involved than that of the
densest subgraph theorem and the case-based analysis is detailed in Appendix B.

3.3 Running Time Analysis 

In this section we analyze the time that it takes for the nodes to generate an
approximation to the densest subgraph. Algorithm 1 continuously runs this procedure
so that it always maintains an approximation that is guaranteed to be
near-optimal since we assume that the network does not change too quickly. The
time that it takes for Algorithm 1 to compute a complete family of subgraphs
is simply $O(Dp) = O(D\log_{1+\epsilon} n)$ since there are $p = O(\log_{1+\epsilon} n)$ phases
(Section 2.1), each of which is completed in $O(D)$ time (Section 2.2). Note that step
9 of Algorithm 1 can be done in a single round since every node already knows
$m_j/n_j$ and can easily check, in one round, the number of neighbors in $G_t$ that
are in $V_j$.
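To get a feel for the magnitudes involved, the bound can be evaluated for concrete parameters. In the sketch below the hidden constant is exposed as a parameter `c` with an arbitrary default of 1; that default and the function names are assumptions made for illustration only.

```python
import math


def num_phases(n, eps):
    """Number of phases p, taken here as ceil(log_{1+eps} n)."""
    return math.ceil(math.log(n) / math.log(1 + eps))


def rounds_per_family(n, D, eps, c=1):
    """Rounds to compute one complete family: c * D * p (c is hypothetical)."""
    return c * D * num_phases(n, eps)
```

For n = 1000 and $\epsilon = 0.5$ this gives 18 phases, so with D = 5 and c = 1 one complete family takes on the order of 90 rounds; halving $\epsilon$ roughly doubles the number of phases.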

When the nodes need to compute an approximation to the at-least-k-densest
subgraph in Algorithm 2, they can do so by choosing the densest subgraph among
the last complete family of subgraphs found by Algorithm 1. Unfortunately, there
is no guarantee that the densest such graph has at least k nodes in it, so we fix
this via padding. The subgraph is padded to contain at least k nodes by having
each node that is not part of the subgraph attempt to join the subgraph with
an appropriate probability. It can be shown via Chernoff bounds that, with high
probability, within $O(\log n)$ such attempts there are enough nodes added to the
subgraph to get its size to at least k. As a result, Algorithm 2 runs in $O(D \log n)$
time.

4 Related Work 

The problem of finding size-bounded densest subgraphs has been studied
extensively in the classical setting. Finding a maximum density subgraph in an
undirected graph can be solved in polynomial time [23, 35]. However, the problem
becomes NP-hard when a size restriction is enforced. In particular, finding
a maximum density subgraph of size exactly k is NP-hard [6, 19] and no
approximation scheme exists under a reasonable complexity assumption [28]. Recently
Bhaskara et al. [10] showed integrality gaps for SDP relaxations of this problem.
Khuller and Saha [29] considered the problem of finding densest subgraphs with
size restrictions and showed that these are NP-hard. Khuller and Saha [29] and
also Andersen and Chellapilla [4] gave constant factor approximation algorithms.
Some of our algorithms are based on those presented in [29].

Our work differs from the above mentioned ones in that we address the 
issues in a dynamic setting, i.e., where edges of the network change over time. 
Dynamic network topology and fault tolerance have always been core concerns of 
distributed computing [7, 36]. There are many models and a large volume of work
in this area. A notable recent model is the dynamic graph model introduced by
Kuhn, Lynch and Oshman in [32]. They introduced a stability property called
T-interval connectivity (for T > 1) which stipulates the existence of a stable 
connected spanning subgraph for every T rounds. Though our models are not 
fully comparable (we allow our networks to get temporarily disconnected as long 
as messages eventually make their way through it), the graphs generated by our 
model are similar to theirs except for our limited rate of churn. They show that 
they can determine the size of the network in $O(n^2)$ rounds and also give a
method for approximate counting. We differ in that our bounds are sublinear in 
n (when D is small) and we maintain our dense graphs at all times. 

We work under the well-studied CONGEST model (see, e.g., [41] and the
references therein). Because of its realistic communication restrictions, there
has been much research done in this model (e.g., see [36, 41, 39]). In particular,
there has been much work done in designing very fast distributed approximation
algorithms (that are even faster at the cost of producing sub-optimal solutions)
for many fundamental problems (see, e.g., [17, 16, 26, 27]). Among many graph
problems studied, the densest subgraph problem falls into the "global problem"
category where it seems that one needs at least $\Omega(D)$ rounds to compute or
approximate (since one needs to at least know the number of nodes in the graph
in order to compute the density). While most results we are aware of in this
category were shown to have a lower bound of $\Omega(\sqrt{n}/\log n)$, even on graphs with
small diameter (see [13] and references therein), the densest subgraph problem
is one example for which this lower bound does not hold.

Our algorithm requires certain size estimation algorithms as a subroutine. 
An important tool that also addresses network size estimation is a Controller. 
Controllers were introduced in [1] and they were implemented on 'growing' trees,
but this was later extended to a more general dynamic model [30, 18]. Network
size estimation itself is a fundamental problem in the distributed setting and
closely related to other problems like leader election. For anonymous networks
and under some reasonable assumptions, exact size estimation was shown to be
impossible [12] as was leader election [5] (using symmetry concerns). Since then,
many probabilistic estimation techniques have been proposed using exponential
and geometric distributions [32, 3, 37]. Of course, the problem is even more
challenging in the dynamic setting.

Self-* systems [9, 14, 15, 31, 34, 42, 21, 40, 24, 25, 55] are worth mentioning here.
Often, a crucial condition for such systems is the initial detection of a particular 
state. In this respect, our algorithm can be viewed as a self-aware algorithm 
where the nodes monitor their state with respect to the environment, and this 
could be used for developing powerful self-* algorithms. 

5 Future Work and Conclusions 

We have presented efficient decentralized algorithms for finding dense subgraphs 
in distributed dynamic networks. Our algorithms not only show how to com- 
pute size-constrained dense subgraphs with provable approximation guarantees, 
but also show how these can be maintained over time. While there has been 
significant research on several variants of the dense subgraph computation prob- 
lem in the classical setting, to the best of our knowledge this is the first formal 
treatment of this problem for a distributed peer-to-peer network model. 

Several directions for future research arise naturally from our work. The
first specific question is whether our algorithms and analyses can be improved
to guarantee $O(D + \log n)$ rounds instead of $O(D \log n)$, even in static networks.
Alternatively, can one show a lower bound of $\Omega(D \log n)$ in static networks?
Bounding the value $D$ in terms of the instantaneous graphs and change rate $r$
would also be an interesting direction of future work. It is also interesting to determine
whether the densest subgraph problem can be solved exactly in $O(D\,\mathrm{poly}\log n)$
rounds in the static setting, and to develop dynamic algorithms without density
lower bound assumptions. Another open problem (suggested to us by David
Peleg) that seems to be much harder is the at-most-$k$ densest subgraph problem.
One could also consider various other definitions of density and study distributed 
algorithms for them, as well as explore whether any of these techniques extend 
directly or indirectly to specific applications. Finally, it would be interesting to 
extend our results from the edge alteration model to allow node alterations as 
well. 
Appendix

A Counting the number of nodes and edges in a subgraph 



Our algorithms make use of an operation in which all the nodes (edges) in a
given subgraph need to be counted for different phases of the algorithm. We
achieve this by using the node-counting algorithm of Kuhn et al. [35, Algorithm
2] that gives a $(1 \pm \epsilon)$-approximation of the number of nodes in a network. There
are, however, several modifications that have to be made to their algorithm to
work in our setting, and we describe these next.

For completeness, our modified version of Kuhn et al.'s algorithm is given
in Algorithm 3. Note that their algorithm requires an upper bound on the size
of the network ($N$), the very quantity that we are estimating. We later give an
algorithm that can provide this upper bound, thereby removing this assumption.
This algorithm works by generating a number of independent exponential
variables at each node and using the fact that the minimum of such quantities
gives a means for estimating their cardinality (the minimum of $n$ independent
rate-1 exponential variables is itself exponential with rate $n$). The first change
we make is that we do not have the entire network run this algorithm in a given phase,
but only the nodes in the current subgraph (denoted here as $V'$). Though all
the nodes in the network take part in the computation, only the nodes in $V'$
generate exponentially distributed values and hence the final estimate is for this
subgraph. Secondly, we change the termination condition of the algorithm. The
algorithm of Kuhn et al. terminated when a reasonable estimate was reached at
each node. Since in our context we have a bound on the number of rounds it
takes a message to traverse the network (the dynamic diameter $D$), we simply
run for this many rounds and are guaranteed that by the end of $D$ rounds all
the nodes have the same minimum values. The proof that this algorithm gives
a $(1 \pm \epsilon)$-approximation with high probability is nearly identical to that in [32],
and is hence omitted here.



Input: A set of nodes $V' \subseteq V$ (each node knows whether it is in $V'$ or not), dynamic
diameter $D$, and error parameter $\epsilon$.

Output: $\tilde{n}_u$, a $(1 \pm \epsilon)$-approximation of the number of nodes in $V'$.

1: Let $c > 0$ and let $N$ be an upper bound on the size of the network
2: Let $\ell = \lceil 27(2 + 2c) \log N / \epsilon^2 \rceil$
3: Each node $u \in V'$ generates an $\ell$-tuple of independent exponential variables
   with rate 1: $Z^u = (Y_1^u, \ldots, Y_\ell^u)$; all other nodes $v \in V - V'$ generate
   $Z^v = (\infty, \infty, \ldots, \infty)$.
4: for $r = 1, \ldots, D$ do
5:   Broadcast $Z^u$ if $Z^u \neq (\infty, \infty, \ldots, \infty)$.
6:   Receive $Z^{v_1}, \ldots, Z^{v_s}$ from neighbors.
7:   for $i = 1, \ldots, \ell$ do
8:     $Z_i^u = \min\{Z_i^u, Z_i^{v_1}, \ldots, Z_i^{v_s}\}$
9:   end for
10: end for
11: Output $\tilde{n}_u = \ell / \sum_{i=1}^{\ell} Z_i^u$.

Algorithm 3: [32] Randomized ApproximateCounting($V'$, $D$, $\epsilon$)
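The counting procedure above can be sketched in a few lines of Python. This is only an illustrative simulation under stated assumptions: the adjacency map `neighbors` is kept fixed across rounds (a dynamic network would change it every round), `log` is read as the natural logarithm, every node broadcasts in every round (all-infinity tuples never change a minimum), and the helper names `tuple_length` and `approximate_counting` are hypothetical.

```python
import math
import random


def tuple_length(N, eps, c=1.0):
    """l = ceil(27 (2 + 2c) log N / eps^2); natural log is an assumption."""
    return math.ceil(27 * (2 + 2 * c) * math.log(N) / eps ** 2)


def approximate_counting(neighbors, members, D, ell, rng):
    """Run D synchronous rounds; return each node's estimate of |members|.

    neighbors: dict node -> list of neighbours (fixed here for brevity).
    members:   the set V' whose size is being estimated.
    """
    # Nodes of V' draw ell rate-1 exponentials; all others start at infinity.
    Z = {
        u: [rng.expovariate(1.0) if u in members else math.inf
            for _ in range(ell)]
        for u in neighbors
    }
    for _ in range(D):
        old = Z  # values as they were at the start of the round
        Z = {
            u: [min(vals)
                for vals in zip(old[u], *(old[v] for v in neighbors[u]))]
            for u in neighbors
        }
    # ell divided by the sum of coordinate-wise minima (0.0 if V' is empty).
    return {u: ell / sum(Z[u]) for u in neighbors}


if __name__ == "__main__":
    n = 30
    ring = {u: [(u - 1) % n, (u + 1) % n] for u in range(n)}  # diameter 15
    est = approximate_counting(ring, set(range(20)), D=15,
                               ell=tuple_length(N=64, eps=0.5),
                               rng=random.Random(1))
    print(round(est[0], 2))
```

Because every node ends up holding the same coordinate-wise minima after $D$ rounds, all nodes report the same estimate in this sketch.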

As was noted above, the algorithm of Kuhn et al. needs an upper bound $N$
on the size of the network. In Algorithm 4 we give an algorithm that provides
this upper bound (indeed, a 2-approximation) using a similar technique. It does
not assume that nodes have unique IDs nor does it need an upper bound on the
size of $V'$, but it does need to know the dynamic diameter $D$. The algorithm is
similar to the ELECT algorithm in [2], except that we use it here to estimate
the size of a set of nodes in a dynamic network rather than elect a leader in
a static one. This algorithm uses the maximum (rather than the minimum) of
discrete (rather than real-valued) independent exponentially distributed values.



Input: A set of nodes $V'$ (each node knows whether it is in $V'$ or not), dynamic
diameter $D$, and a failure probability $\delta$.

Output: $n'$, a $(2, \delta)$-approximation of the number of nodes in $V'$ (i.e., if the number
of nodes in $V'$ is $n$, then $P(n/2 < n' < 2n) > 1 - \delta$)

1: Let $\ell = 65 \ln(1/\delta)$
2: for $i = 1, \ldots, \ell$ do
3:   Each node $v \in V'$ tosses an unbiased coin until it sees a head. Let $X_i^v$ be the
     number of tosses it performs.
4: end for
5: for $r = 1, \ldots, D$ do
6:   Broadcast $X^v$ to all of its neighbors.
7:   Receive $X^{v_1}, \ldots, X^{v_s}$ from neighbors.
8:   for $i = 1, \ldots, \ell$ do
9:     $X_i^v = \max\{X_i^v, X_i^{v_1}, \ldots, X_i^{v_s}\}$
10:  end for
11: end for
12: Output the median of $(2^{X_1^v}, \ldots, 2^{X_\ell^v})$.

Algorithm 4: CountNodes($V'$, $D$, $\delta$)
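A minimal Python sketch of the coin-tossing estimator follows. Several details are assumptions made for illustration only: the adjacency map is static, $\ell$ is rounded up to an integer, nodes outside $V'$ hold the value 0 (which never wins a maximum), and the lower median is used so that the output is an actual power of two. The function names are hypothetical.

```python
import math
import random
import statistics


def tosses_until_head(rng):
    """Number of fair-coin tosses up to and including the first head."""
    tosses = 1
    while rng.random() < 0.5:  # a tail occurs with probability 1/2
        tosses += 1
    return tosses


def count_nodes(neighbors, members, D, delta, rng):
    """Return every node's estimate of |members| after D broadcast rounds."""
    ell = math.ceil(65 * math.log(1 / delta))
    # Assumption: nodes outside V' hold 0 in every coordinate.
    X = {
        u: [tosses_until_head(rng) if u in members else 0
            for _ in range(ell)]
        for u in neighbors
    }
    for _ in range(D):
        old = X  # values at the start of the round
        X = {
            u: [max(vals)
                for vals in zip(old[u], *(old[v] for v in neighbors[u]))]
            for u in neighbors
        }
    # Median over the ell coordinates of 2^(maximum toss count).
    return {u: statistics.median_low([2 ** x for x in X[u]])
            for u in neighbors}


if __name__ == "__main__":
    n = 200
    star = {0: list(range(1, n))}
    star.update({u: [0] for u in range(1, n)})  # hub-and-spoke, diameter 2
    est = count_nodes(star, set(range(n)), D=2, delta=0.01,
                      rng=random.Random(3))
    print(est[0])
```

On small instances such as this one, the estimate is a power of two within a small constant factor of $|V'|$; the sketch does not attempt to verify the exact constants of the guarantee.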



The estimation guarantee of the algorithm is given by the following theorem: 

Theorem 13 (Approximation guarantee). After $D$ rounds, all the nodes
have the same estimate of $n = |V'|$ and this estimate $n'$ is such that
$P(n/2 < n' < 2n) > 1 - \delta$.

Proof. Consider any one coordinate of the $\ell$-tuple, say $i$. After $D$ rounds, by the
definition of dynamic diameter, all the values $X_i^v$ have been transmitted to all
the nodes, and so they all have the same maximum value. We show that $2^{X_i}$ is
a good approximation of $n$.

For an arbitrary $X_i^v$, we have a cumulative distribution function of
$P(X_i^v \le k) = 1 - 1/2^k$. Hence, the cumulative distribution function of
$X_{\max} = \max_{v \in V'} X_i^v$ is $P(X_{\max} \le k) = (1 - 1/2^k)^n$. From this we can
compute the probability:

$$
\begin{aligned}
P(\lg n - 1 < X_{\max} < \lg n + 1) &= P(X_{\max} \le \lg n + 1) - P(X_{\max} \le \lg n - 2) \\
&= \left(1 - \frac{1}{2n}\right)^n - \left(1 - \frac{4}{n}\right)^n \\
&> 1/2 \quad (\text{for } n > 4).
\end{aligned}
$$

Hence, $n/2 < 2^{X_{\max}} < 2n$ with probability greater than 1/2. Using standard
Chernoff bound techniques, it is easy to show that taking the median of
$\ell = O(\ln(1/\delta))$ such estimates reduces the failure probability down to $\delta$.
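The arithmetic in the last step can be checked numerically. The short Python sketch below evaluates the cumulative distribution function $(1 - 1/2^k)^n$ at $k = \lg n + 1$ and $k = \lg n - 2$, which gives $(1 - 1/(2n))^n$ and $(1 - 4/n)^n$, and prints their difference for a few values of $n$. The function name is hypothetical, and the check covers only this inequality, not the rest of the argument.

```python
def single_estimate_bound(n):
    """(1 - 1/(2n))^n - (1 - 4/n)^n for n > 4.

    This is the CDF (1 - 2^-k)^n evaluated at k = lg n + 1
    minus the same CDF evaluated at k = lg n - 2.
    """
    return (1 - 1 / (2 * n)) ** n - (1 - 4 / n) ** n


if __name__ == "__main__":
    for n in (5, 16, 100, 10_000):
        print(n, round(single_estimate_bound(n), 4))
```

For large $n$ the difference approaches $e^{-1/2} - e^{-4}$, which is roughly 0.59.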

Note that since all the nodes know the value of $D$, Algorithm 4 takes precisely
$D$ rounds to execute. Also note that the maximum number of bits that
a node has to transmit per round is not too high. We can bound $X_{\max}$ to
within $O(\log n)$ with high probability, and so no node communicates more
than $O(\log(1/\delta) \log\log n)$ bits in a given round with high probability.

In summary, in each phase of our algorithm we use Algorithm 4 to get an
upper bound on the size of $V'$ (with high probability) and then apply the modified
algorithm of Kuhn et al. (Algorithm 3) to get a $(1 \pm \epsilon)$-approximation of the
size of $V'$ using the upper bound from the previous algorithm, all in precisely
$2D$ rounds of communication.

We next discuss how the number of edges in the induced subgraph is computed.
The algorithm for counting edges is based on the one for counting the
number of nodes: each node $u$ in the subgraph counts its degree $d_u$ and simulates
the behavior of Algorithms 4 and 3 with $d_u$ independent copies of the exponentially
distributed tuples. This increases the computation cost at each node by
a $d_u$ factor, but doesn't affect the number of rounds for the above algorithms.
Also note that since the component-wise max or min of the tuples is all that
gets transmitted, there is no increase in the amount of data being broadcast by
each node. At the end of the computation, the nodes have an estimate of two
times the number of edges in the subgraph (since both nodes at the end of an
edge report it). The details are given in Algorithm 5.



Input: A set of nodes $V'$ (each node knows whether it is in $V'$ or not) and number
$\epsilon > 0$.

Output: A $(1 \pm \epsilon)$-approximation to the number of edges in the subgraph
induced by $V'$.

1: Every node in $V'$ broadcasts a message to its neighbors.
2: Each node $u$ counts the number of neighbors in $V'$ that communicated with it, call
   this $d_u$.
3: Algorithm 4 is run, with each node $u$ simulating $d_u$ separate nodes, to get an upper
   bound on $\sum_u d_u$.
4: Algorithm 3 is run, with each node $u$ simulating $d_u$ separate nodes, to get a $(1 \pm \epsilon)$
   estimate of $\sum_u d_u$, call it $m'$.
5: Output $m'/2$

Algorithm 5: CountEdges($V'$, $\epsilon$)

The analysis of the approximation guarantee for the number of edges is almost 
identical to that for the number of nodes, and is omitted here. Counting the 
number of edges also takes 2D rounds in total, with no node broadcasting more 
than $O(\log n)$ bits in any round with high probability.
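The idea of each node simulating $d_u$ virtual nodes can be illustrated with a short Python sketch. To keep it small, the sketch takes the global coordinate-wise minimum directly instead of flooding it for $D$ rounds, and it assumes the tuple length `ell` is given rather than derived from an upper bound; both simplifications, and the function names, are assumptions of this example.

```python
import math
import random


def induced_degrees(edges, members):
    """Degree of each node of V' inside the subgraph induced by V'."""
    deg = {u: 0 for u in members}
    for u, v in edges:
        if u in members and v in members:
            deg[u] += 1
            deg[v] += 1
    return deg


def estimate_edges(edges, members, ell, rng):
    """Estimate the number of induced edges from exponential minima."""
    mins = [math.inf] * ell
    for d in induced_degrees(edges, members).values():
        for _ in range(d):  # the node simulates d virtual nodes
            mins = [min(m, rng.expovariate(1.0)) for m in mins]
    degree_sum = ell / sum(mins)  # estimates the sum of d_u; 0.0 if no edges
    return degree_sum / 2         # every edge is counted from both endpoints


if __name__ == "__main__":
    members = set(range(12))
    clique = [(u, v) for u in range(12) for v in range(u + 1, 12)]
    edges = clique + [(0, 12), (0, 13)]  # two edges leave the subgraph
    print(round(estimate_edges(edges, members, ell=2000,
                               rng=random.Random(5)), 1))
```

In the example the induced subgraph is a clique on 12 nodes with 66 edges, so the printed estimate should be close to 66.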



B Proof of Theorem 12

Lemma 14. p t »(Vi U V") > mm (^S^p-, p t ,(H*) - 3Sp t (H*fj . 

Proof. Let $H'$ be the at-least-$k$ densest subgraph of $G_{t'}$. Note that

$$\rho_{t'}(H^*) \le \rho_{t'}(H'). \tag{6}$$

Now, define $\ell$, $H^1, \ldots, H^\ell$ and $D$ using Algorithm 6 (which is similar to the
process defined in [53] to prove that the algorithm in [29] is a 2-approximation).
We note that we are not interested in the efficiency of this algorithm as it is only
used to prove the approximation guarantee.



1: Let $j = 0$, $G^0_{t'} = G_{t'}$ and $D = \emptyset$. For any set of vertices $X$, let $E_{t'}(X)$ be the set
   of edges in the subgraph of $G_{t'}$ induced by $X$.
2: while $|D| < k/(1-\delta)$ or $|E_{t'}(D) \cap E_{t'}(H')| < \frac{1}{3}|E_{t'}(H')|$ do
3:   For any $j$, let $H^j$ be the densest subgraph of $G^j_{t'}$.
4:   $D = D \cup V(H^j)$.
5:   Let $G^{j+1}_{t'}$ be the graph obtained from $G^j_{t'}$ by deleting nodes in $H^j$.
6:   $j = j + 1$.
7: end while
8: Let $\ell = j - 1$.

Algorithm 6: Defining $\ell$, $H^1, \ldots, H^\ell$ and $D$ for the proof of Lemma 14

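Note the following simple observation:

Observation 15. For all $j = 1, \ldots, \ell$, $\rho_{t'}(H^j) \ge \frac{2}{3}\rho_{t'}(H')$.

Proof. Since $|E_{t'}(D) \cap E_{t'}(H')| < \frac{1}{3}|E_{t'}(H')|$ in every iteration of the while loop,

$$\left|E_{t'}\left(V(G^j_{t'}) \cap V(H')\right)\right| \ge \frac{2}{3}|E_{t'}(H')|.$$

That is, there are at least 2/3 fraction of edges of $H'$ left in $G^j_{t'}$. This implies
that the density of the subgraph of $G^j_{t'}$ induced by nodes in $H'$ is at least

$$\rho_{t'}\left(V(G^j_{t'}) \cap V(H')\right) = \frac{\left|E_{t'}\left(V(G^j_{t'}) \cap V(H')\right)\right|}{\left|V(G^j_{t'}) \cap V(H')\right|} \ge \frac{2}{3}\cdot\frac{|E_{t'}(H')|}{|V(H')|} = \frac{2}{3}\rho_{t'}(H').$$

Since $H^j$ is the densest subgraph of $G^j_{t'}$,

$$\rho_{t'}(H^j) \ge \rho_{t'}\left(V(G^j_{t'}) \cap V(H')\right) \ge \frac{2}{3}\rho_{t'}(H')$$

as claimed.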



Let $i^*$ be the smallest index such that $V(D) \subseteq V_{i^*}$ and $V(D) \not\subseteq V_{i^*+1}$. Note
that $i^*$ exists since the algorithm repeats until we get $V_j = \emptyset$. Now we consider
two cases.

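Case 1: $n_{i^*} \ge k$. Let $v$ be any vertex in $V(D) \setminus V_{i^*+1}$. Let $j^*$ be such that
$v \in V(H^{j^*})$. Note that Observation 15 implies that

$$\rho_{t'}(H') \le \frac{3}{2}\rho_{t'}(H^{j^*}). \tag{7}$$

Let $H_{t',i^*}$ be the subgraph of $G_{t'}$ induced by vertices in $V_{i^*}$. Note that

$$\rho_{t'}(H^{j^*}) \le 2\deg_{H^{j^*}}(v) \le 2\deg_{H_{t',i^*}}(v). \tag{8}$$

The first inequality is because we can remove $v$ from $H^{j^*}$ and get a subgraph
of $G^{j^*}_{t'}$ that has higher density than $H^{j^*}$ otherwise. The second inequality is
because $H^{j^*} \subseteq H_{t',i^*}$ (since $V(H^{j^*}) \subseteq D \subseteq V_{i^*}$). Since $v$ is removed from $V_{i^*}$,

$$\deg_{H_{t',i^*}}(v) \le (1+\delta)\frac{m_{i^*}}{n_{i^*}} \tag{9}$$

where $\delta = \epsilon/24$ as in Algorithm 1. By definition of $i$ and the fact that $n_{i^*} \ge k$,

$$\frac{m_{i^*}}{n_{i^*}} = \frac{m_{i^*}}{\max(k, n_{i^*})} \le \frac{m_i}{\max(k, n_i)}. \tag{10}$$

Note that $|V_i \cup \hat{V}| \le n_i/(1-\delta)$ and $m_i \le (1+\delta)|E_{t''}(V_i)|$ with high probability.
It follows that

$$\frac{m_i}{\max(k, n_i)} \le \frac{1+\delta}{1-\delta}\,\rho_{t''}(V_i \cup \hat{V}). \tag{11}$$

Combining Eq. (6) with (7)-(11), we get $\rho_{t'}(H^*) \le \frac{3(1+\delta)^2}{1-\delta}\,\rho_{t''}(V_i \cup \hat{V})$ and thus
the lemma.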

Case 2: $n_{i^*} < k$. This implies that with high probability $|V_{i^*}| \le (1+\delta)k$.
Since $D \subseteq V_{i^*}$, $|D| \le (1+\delta)k$. By the condition in the while loop of Algorithm 6,

$$|E_{t'}(V_{i^*})| \ge |E_{t'}(D)| \ge \frac{1}{3}|E_{t'}(H')|. \tag{12}$$

Note that $m_{i^*} \ge |E_{t'}(V_{i^*})| - Tr \ge \frac{1}{3}|E_{t'}(H')| - \delta k \rho_t(H^*)$. Thus,



???.; 



max(fc, rii* ) 
By Eq. (6),

$$\frac{m_i}{\max(k, n_i)} \ge \frac{m_{i^*}}{\max(k, n_{i^*})} \ge \frac{1}{3}\rho_{t'}(H') - \delta\rho_t(H^*) \ge \frac{1}{3}\rho_{t'}(H^*) - \delta\rho_t(H^*). \tag{14}$$

Note that $|V_i \cup \hat{V}| \le k/(1-\delta)$ and $m_i \le (1+\delta)|E_{t''}(V_i)|$ with high probability.
It follows that

Combining Eq.©, $Q and (15]), we get ^/(^ uf)> ^^gffi^ 
and thus the lemma. 



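B.1 Proof of Observation 15

Proof. Since $|E_{t'}(D) \cap E_{t'}(H')| < \frac{1}{3}|E_{t'}(H')|$ in every iteration of the while loop,

$$\left|E_{t'}\left(V(G^j_{t'}) \cap V(H')\right)\right| \ge \frac{2}{3}|E_{t'}(H')|.$$

That is, there are at least 2/3 fraction of edges of $H'$ left in $G^j_{t'}$. This implies
that the density of the subgraph of $G^j_{t'}$ induced by nodes in $H'$ is at least

$$\rho_{t'}\left(V(G^j_{t'}) \cap V(H')\right) = \frac{\left|E_{t'}\left(V(G^j_{t'}) \cap V(H')\right)\right|}{\left|V(G^j_{t'}) \cap V(H')\right|} \ge \frac{2}{3}\cdot\frac{|E_{t'}(H')|}{|V(H')|}.$$

Since $H^j$ is the densest subgraph of $G^j_{t'}$,

$$\rho_{t'}(H^j) \ge \rho_{t'}\left(V(G^j_{t'}) \cap V(H')\right) \ge \frac{2}{3}\rho_{t'}(H')$$

as claimed.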

Proof (of Theorem 12). The theorem follows directly by using Lemma 14
and translating the lower bound on $\rho_{t''}(V_i \cup \hat{V})$ to a lower bound on $\rho_t(V_i \cup \hat{V})$
(similar to the steps in the proof for the densest subgraph). As can be seen, Lemma 14 has
a factor $\frac{1-\delta}{3(1+\delta)^2}$ as compared to a similar term of $\frac{1-\delta}{2(1+\delta)^2}$ in the case of the densest
subgraph. This is why we are only able to obtain a $(3 + \epsilon)$-approximation in
this theorem rather than the $(2 + \epsilon)$-approximation obtained previously. The proof of the
theorem and the $(3 + \epsilon)$-approximation is completed as before by translating
$\rho_{t''}(V_i \cup \hat{V})$ to a lower bound on $\rho_t(V_i \cup \hat{V})$ and subsequently
plugging in the appropriate value for $\delta$ in terms of $\epsilon$.



